301 research outputs found
CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity
We present our submitted systems for Semantic Textual Similarity (STS) Track
4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must
estimate their semantic similarity by a score between 0 and 5. In our
submission, we use syntax-based, dictionary-based, context-based, and MT-based
methods. We also combine these methods in unsupervised and supervised ways. Our
best run ranked 1st on track 4a with a correlation of 83.02% with human
annotations.
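As a rough illustration of how such a correlation figure can be computed, the sketch below implements the Pearson correlation coefficient, the measure commonly used to score STS systems against human judgments. It is an assumption that this is the measure behind the figure quoted above; the function name and the example scores are hypothetical and do not come from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: system outputs and human annotations on the 0-5 scale.
system_scores = [4.8, 0.5, 3.1, 2.0, 4.0]
human_scores = [5.0, 0.0, 3.0, 2.5, 3.5]
r = pearson(system_scores, human_scores)
print(f"correlation: {100 * r:.2f}%")
```

A value reported as a percentage, as in the abstract, is simply the coefficient multiplied by 100.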
Deep Investigation of Cross-Language Plagiarism Detection Methods
This paper is a deep investigation of cross-language plagiarism detection
methods on a new recently introduced open dataset, which contains parallel and
comparable collections of documents with multiple characteristics (different
genres, languages and sizes of texts). We investigate cross-language plagiarism
detection methods for 6 language pairs on 2 granularities of text units in
order to draw robust conclusions on the best methods while deeply analyzing
correlations across document styles and languages.
Comment: Accepted to BUCC (10th Workshop on Building and Using Comparable
Corpora) co-located with ACL 201
ANT COLONY ALGORITHM APPLIED TO AUTOMATIC SPEECH RECOGNITION GRAPH DECODING
In this article we propose an original approach to decoding Automatic Speech Recognition (ASR) graphs with a constructive algorithm based on ant colonies. In classical approaches, when a graph is decoded with a higher-order language model, the algorithm must expand the graph in order to develop each newly observed n-gram. This expansion increases computation time and memory consumption. We propose to use an ant colony algorithm to explore ASR graphs with a new language model without having to expand them. We first present results on the TED English corpus, where 2-gram graphs are decoded with a 4-gram language model. Then, we show that our approach performs better than a conventional Viterbi algorithm when computing time is constrained, and that it allows a highly threaded decoding process on a single graph with strict control of computation time and memory consumption.
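The general idea can be sketched in a few lines, under strong simplifying assumptions. The toy word graph, the trigram table standing in for a higher-order language model, the pheromone update rule and all names below are hypothetical; this is a generic ant colony loop, not the authors' algorithm. What it illustrates is that each ant's complete walk is scored by the richer model, so the graph itself never needs to be expanded.

```python
import math
import random

# Toy word graph (hypothetical): node -> list of (next_node, word).
GRAPH = {
    0: [(1, "the"), (1, "a")],
    1: [(2, "cat"), (2, "cap")],
    2: [(3, "sat"), (3, "sad")],
    3: [],
}
START, END = 0, 3

# Stand-in for a higher-order language model: log-scores of a few trigrams.
# A real system would query an actual n-gram model here instead.
TRIGRAM_LOGP = {("the", "cat", "sat"): -1.0, ("a", "cat", "sat"): -2.0}
UNSEEN_LOGP = -5.0

def lm_score(words):
    """Score a full word sequence with the toy trigram table."""
    return sum(TRIGRAM_LOGP.get(tuple(words[i:i + 3]), UNSEEN_LOGP)
               for i in range(len(words) - 2))

def walk(pheromone, rng):
    """One ant walks from START to END, picking edges in proportion to pheromone."""
    node, path = START, []
    while node != END:
        edges = GRAPH[node]
        weights = [pheromone[(node, nxt, word)] for nxt, word in edges]
        nxt, word = rng.choices(edges, weights=weights, k=1)[0]
        path.append((node, nxt, word))
        node = nxt
    return path

def decode(n_iterations=20, n_ants=30, evaporation=0.1, seed=0):
    """Return the best word sequence found and its language-model score."""
    rng = random.Random(seed)
    pheromone = {(n, nxt, w): 1.0 for n, edges in GRAPH.items() for nxt, w in edges}
    best_path, best_score = None, -math.inf
    for _ in range(n_iterations):
        for _ in range(n_ants):
            path = walk(pheromone, rng)
            score = lm_score([w for _, _, w in path])
            if score > best_score:
                best_path, best_score = path, score
        # Evaporate everywhere, then reinforce the best path seen so far.
        for edge in pheromone:
            pheromone[edge] *= 1.0 - evaporation
        for edge in best_path:
            pheromone[edge] += 1.0
    return [w for _, _, w in best_path], best_score

words, score = decode()
print(" ".join(words), score)
```

The iteration and ant counts bound the work done, which is one way to picture the control over computation time mentioned in the abstract: the loop can be stopped at any point and still return the best path seen so far.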
Modelling, Detection And Exploitation Of Lexical Functions For Analysis.
Lexical functions (LF) model relations between terms
in the lexicon. These relations can be knowledge about
the world (Napoleon was an emperor) or knowledge about
the language (‘destiny’ is a synonym of ‘fate’).
Lexical Functions For Ants Based Semantic Analysis.
Semantic analysis (SA) is a central operation in natural language processing. We can consider it as the resolution of five problems: lexical ambiguity, references, prepositional attachments, interpretative paths and lexical function instantiation.
Ant Colony Algorithm Applied to Automatic speech Recognition Graph Decoding
Extension lexicale de définitions grâce à des corpus annotés en sens (Lexical Expansion of Definitions Based on Sense-Annotated Corpora)
For many natural language processing tasks and applications, it is necessary to determine the semantic relatedness between senses, words or text segments. In this article, we focus on a knowledge-based measure, the Lesk measure, which is certainly among the most commonly used. The similarity between two senses is computed as the number of overlapping words in the definitions of the senses from a dictionary. We study the expansion of definitions through the use of sense-annotated corpora. The idea is to take into account the words that are most frequently used around a particular sense and to use the top of the frequency distribution to extend the corresponding definition. We show improved performance on a Word Sense Disambiguation task, surpassing the state of the art.
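The two ingredients described above, overlap counting and definition expansion, can be sketched as follows. This is a minimal illustration under stated assumptions: the definitions, the annotated contexts, the `top_k` cutoff and the function names are all hypothetical, tokenization is plain whitespace splitting, and no stop-word filtering is done, which a real system would likely add.

```python
from collections import Counter

def lesk_overlap(def_a, def_b):
    """Lesk similarity: number of distinct words shared by two definitions."""
    return len(set(def_a.lower().split()) & set(def_b.lower().split()))

def expand_definition(definition, contexts, top_k=3):
    """Append the top_k most frequent context words of a sense to its definition.

    `contexts` is a list of sentences in which the sense was annotated.
    Words already present in the definition are skipped.
    """
    counts = Counter(w for sentence in contexts for w in sentence.lower().split())
    known = set(definition.lower().split())
    extra = [w for w, _ in counts.most_common() if w not in known][:top_k]
    return definition + " " + " ".join(extra) if extra else definition

# Hypothetical dictionary definitions for two senses.
bank_def = "financial institution that accepts deposits"
money_def = "medium of exchange such as coins and cash"

# Hypothetical sentences annotated with the financial sense of "bank".
bank_contexts = ["cash deposited at the bank", "the bank lends cash", "cash and coins"]

before = lesk_overlap(bank_def, money_def)
expanded = expand_definition(bank_def, bank_contexts)
after = lesk_overlap(expanded, money_def)
print(before, after)
```

In this toy case the raw definitions share no word, while the expanded definition picks up "cash" from the annotated contexts and the overlap becomes non-zero.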
Sense Embeddings in Knowledge-Based Word Sense Disambiguation